
port affine tests #7708

Merged · pmeier merged 9 commits into pytorch:main from port/affine on Jun 30, 2023
Conversation

pmeier (Collaborator) commented Jun 28, 2023

pytorch-bot (bot) commented Jun 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7708

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 980e77b:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -162,7 +165,7 @@ def _check_dispatcher_dispatch(dispatcher, kernel, input, *args, **kwargs):
if isinstance(input, datapoints._datapoint.Datapoint):
# Due to our complex dispatch architecture for datapoints, we cannot spy on the kernel directly,
# but rather have to patch the `Datapoint.__F` attribute to contain the spied on kernel.
spy = mock.MagicMock(wraps=kernel)
spy = mock.MagicMock(wraps=kernel, name=kernel.__name__)
pmeier (Collaborator, Author):
QoL
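
For context, the name= argument mainly improves how the spied-on mock renders in failure output. A minimal standalone sketch, with a made-up stand-in for the kernel:

from unittest import mock

def some_kernel(*args, **kwargs):  # made-up stand-in, not an actual torchvision kernel
    pass

anonymous = mock.MagicMock(wraps=some_kernel)
named = mock.MagicMock(wraps=some_kernel, name=some_kernel.__name__)

print(anonymous)  # <MagicMock id='...'>
print(named)      # <MagicMock name='some_kernel' id='...'>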

Comment on lines +479 to +481
("dtype", "make_mask"), [(torch.uint8, make_segmentation_mask), (torch.bool, make_detection_mask)]
)
def test_kernel_mask(self, dtype_and_make_mask):
dtype, make_mask = dtype_and_make_mask
def test_kernel_mask(self, dtype, make_mask):
pmeier (Collaborator, Author):
QoL
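
For context, passing multiple argnames to pytest.mark.parametrize makes pytest unpack each tuple into separate test arguments, so the manual unpacking line goes away. A minimal standalone sketch, with stand-ins for the real mask helpers:

import pytest
import torch

def make_segmentation_mask():  # stand-in for the real test helper
    return torch.zeros(17, 11, dtype=torch.uint8)

def make_detection_mask():  # stand-in for the real test helper
    return torch.zeros(1, 17, 11, dtype=torch.bool)

@pytest.mark.parametrize(
    ("dtype", "make_mask"),
    [(torch.uint8, make_segmentation_mask), (torch.bool, make_detection_mask)],
)
def test_kernel_mask(dtype, make_mask):
    # Each tuple is unpacked into `dtype` and `make_mask`; no `dtype, make_mask = ...` needed.
    assert make_mask().dtype == dtype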

@@ -744,7 +746,7 @@ def _make_input(self, input_type, *, dtype=None, device="cpu", spatial_size=(17,
@pytest.mark.parametrize("dtype", [torch.float32, torch.uint8])
@pytest.mark.parametrize("device", cpu_and_cuda())
def test_kernel_image_tensor(self, dtype, device):
check_kernel(F.horizontal_flip_image_tensor, self._make_input(torch.Tensor))
check_kernel(F.horizontal_flip_image_tensor, self._make_input(torch.Tensor, dtype=dtype, device=device))
pmeier (Collaborator, Author):
We didn't spot that before

Comment on lines +790 to +799
(torch.Tensor, F.horizontal_flip_image_tensor),
(PIL.Image.Image, F.horizontal_flip_image_pil),
(datapoints.Image, F.horizontal_flip_image_tensor),
(datapoints.BoundingBox, F.horizontal_flip_bounding_box),
(datapoints.Mask, F.horizontal_flip_mask),
(datapoints.Video, F.horizontal_flip_video),
],
)
def test_dispatcher_signature(self, kernel, input_type):
check_dispatcher_signatures_match(F.resize, kernel=kernel, input_type=input_type)
check_dispatcher_signatures_match(F.horizontal_flip, kernel=kernel, input_type=input_type)
pmeier (Collaborator, Author):
We need to pay more attention when copy-pasting / reviewing. This went undetected.


return input

def _adapt_fill(self, value, *, dtype):
pmeier (Collaborator, Author):
This will likely need to be factored out later, since we need this for every transform that supports fill.
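
As an illustration only (this is an assumption about what such a shared helper could do, not the actual code from the PR): adapting a fill value typically means scaling it from a relative range to the value range of the target dtype.

import torch

def adapt_fill(value, *, dtype):
    # Hypothetical sketch: scale a fill given relative to [0, 1] to the value
    # range of the target dtype, e.g. 1.0 -> 255 for uint8.
    if value is None:
        return value

    max_value = 1.0 if dtype.is_floating_point else float(torch.iinfo(dtype).max)

    if isinstance(value, (int, float)):
        return type(value)(value * max_value)
    if isinstance(value, (list, tuple)):
        return type(value)(type(v)(v * max_value) for v in value)
    raise TypeError(f"fill should be an int, float, or a sequence thereof, but got {type(value)}")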

@@ -949,18 +878,6 @@ def test_correctness_crop_bounding_box(device, format, top, left, height, width,
torch.testing.assert_close(output_spatial_size, spatial_size)


@pytest.mark.parametrize("device", cpu_and_cuda())
def test_correctness_horizontal_flip_segmentation_mask_on_fixed_input(device):
pmeier (Collaborator, Author):
Forgot about this last time

pmeier marked this pull request as ready for review June 28, 2023 21:44
pmeier requested a review from NicolasHug June 28, 2023 21:44
NicolasHug (Member) left a comment:
The affine tests run in 1.5 minutes on my laptop. By comparison, the Resize tests take 5 seconds (I understand there are fewer combinations to check there).

Can we do anything to reduce the duration of these tests?

NicolasHug (Member):
Can we do anything to reduce the duration of these tests?

A significant portion of these tests are tests that:

  • are extremely unlikely to go red once we've checked that they're green
  • do not need to be checked on every single PR. We probably don't even need to check them every single day.

I'm thinking in particular about tests like check_dispatcher*(), check_transform(), and probably others.
Should we start thinking of a test job that would run those on a regular basis, but not on PRs by default? Something similar to test_dataset_download.py? We could also add some kind of GH label that would trigger those tests on a PR when explicitly requested.

pmeier (Collaborator, Author) commented Jun 29, 2023

Regarding the long test time for this class: test_kernel_image_tensor contributes the largest part with its massive parametrization:

❯ pytest --co -q test/test_transforms_v2_refactored.py::TestAffine | tail -n1
40899 tests collected in 3.50s
❯ pytest --co -q test/test_transforms_v2_refactored.py::TestAffine::test_kernel_image_tensor | tail -n1
35200 tests collected in 2.74s
❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine::test_kernel_image_tensor | tail -n1
35200 passed in 104.84s (0:01:44)
❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine -k "not test_kernel_image_tensor" | tail -n1
5699 passed, 35200 deselected in 19.33s

Second to that is test_kernel_bounding_box:

❯ pytest -q --co test/test_transforms_v2_refactored.py::TestAffine::test_kernel_bounding_box | tail -n1
4800 tests collected in 2.09s
❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine::test_kernel_bounding_box | tail -n1
4800 passed in 17.18s

If we can bring both tests down to a more reasonable duration, the total time of TestAffine is not too bad in general:

❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine -k "not test_kernel_image_tensor and not test_kernel_bounding_box" | tail -n1
899 passed, 40000 deselected in 6.10s
❯ pytest -q test/test_transforms_v2_refactored.py::TestResize | tail -n1
767 passed in 3.23s

To achieve that, we could move away from the full matrix of parameters. That is, we would fix most of the parameters and only parametrize over one or a few at a time. This drastically cuts down the overall number of tests, while only leaving minimal gaps that are unlikely to ever hide a bug. Note that we are not talking about the correctness tests here. I'll send a commit to implement this strategy (see the sketch below).
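
For illustration, a minimal sketch of the idea with hypothetical parameter pools (not the actual ones from the PR): instead of taking the cross product of all dimensions, vary one dimension at a time while the others stay at a fixed default.

import pytest
import torch

# Hypothetical parameter pools, just to show the scaling behavior.
ANGLES = [-30, 0, 30]
TRANSLATIONS = [(0, 0), (5, 5)]
DTYPES = [torch.float32, torch.uint8]

# Full matrix: 3 * 2 * 2 = 12 tests, growing multiplicatively with every new dimension.
@pytest.mark.parametrize("angle", ANGLES)
@pytest.mark.parametrize("translate", TRANSLATIONS)
@pytest.mark.parametrize("dtype", DTYPES)
def test_full_matrix(angle, translate, dtype):
    ...

# Reduced scheme: 3 + 2 + 2 = 7 tests, growing only additively.
@pytest.mark.parametrize(
    ("angle", "translate", "dtype"),
    [(angle, (0, 0), torch.float32) for angle in ANGLES]
    + [(0, translate, torch.float32) for translate in TRANSLATIONS]
    + [(0, (0, 0), dtype) for dtype in DTYPES],
)
def test_one_at_a_time(angle, translate, dtype):
    ...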

None of that precludes running these tests on a schedule as well. However, implementing that will take more time, so I'll go for the low-hanging fruit first.

pmeier (Collaborator, Author) commented Jun 29, 2023

After 39b91c8

❯ pytest -q --co test/test_transforms_v2_refactored.py::TestAffine | tail -n1
1287 tests collected in 0.25s
❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine | tail -n1
1287 passed in 4.32s
❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine -k "not cuda" | tail -n1
1087 passed, 200 deselected in 3.28s
❯ pytest -q test/test_transforms_v2_refactored.py::TestAffine -k "cuda" | tail -n1
200 passed, 1087 deselected in 2.76s

I can live with that without going for a schedule solution for now.

pmeier requested a review from NicolasHug June 29, 2023 11:32
NicolasHug (Member) left a comment:
Thanks Philip, a few comments but LGTM anyway

)
@pytest.mark.parametrize("fill", _CORRECTNESS_FILL)
@pytest.mark.parametrize("seed", list(range(10)))
def test_transform_image_correctness(self, center, interpolation, fill, seed):
NicolasHug (Member):
I guess we can't unify this test with the functional one, because the parameters are too different (i.e. they're randomly sampled in the transform)?

That being said, I'm curious why the MAE thresholds are different in the two tests.

pmeier (Collaborator, Author):
I guess we can't unify this test with the functional one, because the parameters are too different (i.e. they're randomly sampled in the transform)?

Correct.

That being said, I'm curious why the MAE thresholds are different in the two tests.

I just put the minimal threshold needed for the tests to pass. I guess in our case the transform test sampled different affine params that ultimately led to a higher MAE. I'll unify them.
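
For reference, the correctness checks being discussed boil down to comparing the output against a reference image and bounding the mean absolute error; a minimal sketch with made-up inputs and threshold:

import torch

def assert_close_by_mae(actual, expected, *, mae_threshold):
    # Compare two images by mean absolute error instead of exact equality,
    # since interpolation differences between backends cause small deviations.
    mae = (actual.float() - expected.float()).abs().mean().item()
    assert mae <= mae_threshold, f"MAE {mae:.3f} exceeds threshold {mae_threshold}"

# Made-up example values.
actual = torch.randint(0, 256, (3, 17, 11), dtype=torch.uint8)
expected = actual.clone()
assert_close_by_mae(actual, expected, mae_threshold=1.0)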

kwargs["degrees"] = 0

with pytest.raises(
ValueError if isinstance(value, list) else TypeError, match=f"{param} should be a sequence of length 2"
NicolasHug (Member):
Probably not worth addressing in this PR because it's not related to the tests, but do you have any thoughts on unifying our parameter error types to always be ValueError? Distinguishing between ValueError and TypeError doesn't seem worth it and leads to this kind of special-casing.

pmeier (Collaborator, Author):
I started out with the strong belief that one should differentiate between the two based on the documentation:

Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError, but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError.

However, in practice it turns out that this distinction is not really helpful. I doubt anyone is doing

try:
    torchvision.transforms.functional.affine(...)
except TypeError:
    # do something
except ValueError:
    # do something else

The most likely scenario is that no one is using these exceptions in their regular control flow, but rather only hits them when there is an actual error. In that case a human is going to look into it, and thus the message is much more important than the exception type.

So overall I see few reasons to differentiate the two in torchvision. That might be different for other packages, where users actually rely on the exception behavior. However, it doesn't bother me enough to go and address it myself. I'll review / stamp a PR if you want to send one.

As a side note: the distinction between TypeError and ValueError here is terrible IMO, because we even use the same error message.

pmeier merged commit 0e49615 into pytorch:main Jun 30, 2023
pmeier deleted the port/affine branch June 30, 2023 08:30
github-actions (bot):
Hey @pmeier!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Jul 3, 2023
Reviewed By: vmoens

Differential Revision: D47186575

fbshipit-source-id: 93dcb9c8c60dc702870dfe6cc5c0d2a2a178a77c